4 research outputs found

Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic

Author: Afify Mohamed
Deng Yonggang
Erdoğan Hakan
Gao Yuqing
Sarıkaya Ruhi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2007

Language modeling for an inflected language such as Arabic poses new challenges for speech recognition and machine translation due to its rich morphology. Rich morphology results in large increases in out-of-vocabulary (OOV) rate and poor language model parameter estimation in the absence of large quantities of data. In this study, we present a joint morphological-lexical language model (JMLLM) that takes advantage of Arabic morphology. JMLLM combines morphological segments with the underlying lexical items, together with additional available information sources about morphological segments and lexical items, in a single joint model. Joint representation and modeling of morphological and lexical items reduces the OOV rate and provides smooth probability estimates while keeping the predictive power of whole words. Speech recognition and machine translation experiments in dialectal Arabic show improvements over word-based and morpheme-based trigram language models. We also show that as the tightness of integration between different information sources increases, both speech recognition and machine translation performance improve.

CiteSeerX

Crossref

Sabanci University Research Database

Semantic confidence measurement for spoken dialog systems

Author: Erdoğan Hakan
Gao Yuqing
Picheny Michael
Sarıkaya Ruhi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/07/2005

This paper proposes two methods to incorporate semantic information into word-level and concept-level confidence measurement. The first method uses tag and extension probabilities obtained from a statistical classer and parser. The second method uses a maximum entropy based semantic structured language model to assign probabilities to each word. Combining semantic features with a lattice posterior probability based confidence measure provides significant improvements over posterior probability alone in an air travel reservation task. At a 5% False Alarm (FA) rate, relative improvements of 28% and 61% in Correct Acceptance (CA) rate are achieved for word-level and concept-level confidence measurements, respectively.

Crossref

Sabanci University Research Database

S-vector: a discriminative representation derived from I-vector for speaker verification

Author: Erdoğan Hakan
Işık Yusuf Ziya
Sarıkaya Ruhi
Publication venue: EURASIP (European Association for Signal Processing)
Publication date: 01/01/2015

Sabanci University Research Database

Using semantic analysis to improve speech recognition performance

Author: Chen Stanley F.
Erdoğan Hakan
Gao Yuqing
Picheny Michael
Sarıkaya Ruhi
Publication venue: Elsevier BV
Publication date: 01/07/2005

Although syntactic structure has been used in recent work in language modeling, little effort has gone into using semantic analysis for language models. In this study, we propose three new language modeling techniques that use semantic analysis for spoken dialog systems. We call these methods concept sequence modeling, two-level semantic-lexical modeling, and joint semantic-lexical modeling. These models combine lexical information with varying amounts of semantic information, using annotation supplied by either a shallow semantic parser or a full hierarchical parser. These models also differ in how the lexical and semantic information is combined, ranging from simple interpolation to tight integration using maximum entropy modeling. We obtain improvements in recognition accuracy over word and class N-gram language models in three different task domains. Interpolation of the proposed models with class N-gram language models provides additional improvement in the air travel reservation domain. We show that as we increase the semantic information utilized and as we increase the tightness of integration between lexical and semantic items, we obtain improved performance when interpolating with class language models, indicating that the two types of models become more complementary in nature.

Sabanci University Research Database